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1.0.  ABSTRACT 

Automatic  seismogram  interpretation  and  onset  estimation  are  desirable  goals  facilitating  the  rapid  location  and 
discrimination  of  seismic  events,  automation  also  relieves  the  human  operator  from  onerous  repetitive  tasks.  At 
regional  distances  the  onsets  of  seismic  phases  are  associated  with  gradual,  rather  than  sudden,  changes  in  the 
statistical  properties  of  the  seismic  time  series,  i.e.  changes  in  autoregressive  models  and  mean  square 
amplitudes.  The  work  presented  combines  two  aspects  of  the  onset  time  estimation,  the  detection  of  the  onset  as 
a  change  in  the  trend  of  the  cumulative  sum  (CUSUM)  of  a  suitable  test  statistic,  and  a  randomized  search  for 
its  location,  simulated  annealing  (SA).  The  latter  is  applied  repeatedly,  thus,  resulting  in  multiple  onset  time 
estimates.  These  typically  form  clusters  around  the  arrival  times  of  the  known  seismic  phases,  but  there  are 
some  scattered  values  as  well,  that  have  to  be  discarded.  The  uncertainties  in  the  onset  times  estimated  can  be 
quantified  in  terms  of  the  spread  in  the  values  of  the  best  defined  clusters.  Various  standard  cluster  analysis 
methods  can  be  applied  to  define  individual  clusters.  The  system  is  presently  developed  using  MATLAB, 
utilizing  its  graphical  user  interface  capabilities. 

2.0.  OBJECTIVES 

The  objective  of  this  work  is  to  develop  automated  methods  for  interpreting  regional  and  teleseismic  short 
period  seismograms  using  ideas  and  algorithms  from  the  statistical  literature.  In  the  analyses  of  seismic  data, 
onset  times  of  the  various  seismic  arrivals  are  of  great  importance,  because  they  are  used  in  locating  the  sources 
and  segmenting  the  seismograms  into  wave  groups  used  in  discrimination  of  various  types  of  events.  Although 
various  onset  time  (or  change  point)  estimation  methods  have  been  tested  in  the  past,  with  success,  for  time 
series  where  the  properties  (amplitudes  and  frequency  spectra)  suddenly  change,  actual  seismograms  are  more 
complex.  Amplitudes  increase  and  decrease  and  spectra  of  various  arrivals  may  be  different.  Because  of  the 
great  variability  of  the  seismic  data,  it  is  necessary  to  make  the  methodology  work  reliably  under  varying 
conditions.  It  is  likely  that  several  approaches  that  have  been  tried  may  have  to  be  combined  to  achieve 
reliability  and  robustness  to  achieve  these  goals. 

3.0.  RESEARCH  ACCOMPLISHED 

3.1.  Background. 

Automatic  phase  arrival  time  estimation  for  regional  arrivals  is  of  considerable  interest  because  of  the  need  for 
rapid  location  and  identification  of  numerous  seismic  events  by  networks  monitoring  natural  and  man-made 
seismic  activity.  Times  of  seismic  ‘phase’  arrivals  can  be  defined  as  time  instants,  where  some  visible 
characteristic,  such  as  amplitude,  frequency  content  or  wave  polarization  changes  in  some  recording.  Typically, 
regional  arrivals  are  high  frequency,  broadband,  emergent  wave  groups  containing  numerous  cycles.  Generally, 
later  arrivals  have  no  clear,  impulsive  waveforms  and  are  preceded  by  the  codas  of  earlier  ones  and  their  onset 
times  can  only  be  defined  to  within  a  few  cycles.  Besides  locating  events  from  multiple  Pn  times  at  several 
stations,  onset  time  estimation  is  useful  in  facilitating  location  using  multiple  arrivals  in  the  same  seismogram 
and  in  automated  application  time  and  frequency  domain  discriminants. 

The  methods  developed  in  this  paper  differ  from  the  semi-automatic  computer  guided  onset  time  determination 
methods  presently  used  by  analysts  in  current  systems,  in  which  preliminary  phase  onset  predictions  suggest 
picks  to  the  analysts.  We  find  that  these  suggested  onset  times  often  take  precedence  over  what  the  analysts 
actually  sees  in  the  seismogram  leading  to  biased  or  censored  onset  times.  What  we  are  trying  to  do  is  to  teach 
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the  computer  to  pick  multiple  phase  onsets  based  on  the  seismograms  only,  as  if  it  were  the  first  time  in  an 
unknown  region. 


There  are  numerous  statistical  approaches  in  the  literature  to  determining  the  arrival  time  of  ‘phases’.  Onsets  of 
packets  of  seismic  energy  associated  with  various  paths,  i.e.  seismic  phases,  are  usually  associated  with  sudden 
changes  in  amplitudes,  spectra,  polarization  and  slowness.  Generally  it  is  assumed  that  it  is  known  that  the 
change  occurs  somewhere  in  a  predefined  time  interval  and  that  the  statistical  model  of  the  time  series  changes 
abruptly.  It  is  also  assumed  that  the  time  series  are  stationary  before  and  after  such  a  change  and  that  the 
statistical  distributions  are  multivariate  Gaussian  (e.g.  Basseville  and  Nikiforov  1993). 

Depending  on  the  autoregressive  models  applied  one  may  test  for  spectral  and  amplitude  changes  (single 
channnel  model),  slowness  changes  (multichannel  model  applied  to  array  sensor  outputs)  or  polarization  (three- 
channel  model)  as  described  by  several  workers  (Kushnir  1996,  Leonard  and  Kennett  1999).  The  model  fitting 
can  be  performed  for  each  guess  of  the  onset  time  (Taylor  et  al  1992,  Takanami  1991)  or  the  fit  to  two  fixed 
models  fitted  at  each  side  of  the  possible  arrival  times  may  be  used  (Kvaerna  1966a,b).  The  minimum  of  the 
summed  squared  residuals  occurs  when  two  models  are  fitted  such  that  the  boundary  of  the  fitting  regions 
correspond  to  the  onset  time.  In  any  case,  the  estimation  process  decreases  the  degrees  of  freedom  in  the 
process,  thus  decreasing  the  stability  and  its  sensitivity.  Common  to  all  these  methods  is  the  assumption  that  the 
time  interval  where  the  phase  onset  occurs  is  known  and  there  are  sufficient  segments  of  the  time  series 
preceding  and  following  the  onset  for  effect  the  estimation  models.  The  approach  described  above  is  quite 
general  and  is  applicable  to  many  kinds  of  problems  involving  changes  in  time  series. 

Subtle  changes  in  the  frequency  content  in  noisy  records  often  give  clue  to  a  human  indicating  the  onset  of  a 
seismic  phase  arrival.  On  the  other  hand,  when  such  human  capabilities  are  needed  it  simply  indicates  a  failure 
of  applying  appropriate  frequency  domain  prefiltering  that  could  produce  an  enhanced  amplitude  contrast 
between  the  noise  and  the  arrival.  Appropriate  prefiltering  in  frequency  is  a  prerequisite  for  further  onset  time 
determination  regardless  of  the  method  (Kvaerna  1996a,b).  Various  schemes  for  optimum  pre-filtering  were 
suggested  and  most  of  these  seem  to  work  well  in  practice  and  the  exact  nature  of  filter  is  not  critical  as  long  as 
they  are  reasonably  close  to  the  optimum  S/N2  shape  (averages  of  signal  amplitude  spectrum  divided  by  the 
squared  noise  amplitude  spectrum).  Minimum  phase  filters  based  on  Kvaerna  ‘s  definition  of  “usable 
bandwidth”  (Kvaerna  1995,  1996a),  as  well  as  those  shaped  according  to  S/N2  ,  S2/(S2+N2)  the  latter  is  a 
Wiener  filter  (Whalen  1971),  where  S  and  N  are  signal  and  noise  spectral  amplitudes  respectively,  or  noise- 
adaptive  predictive  filtering  all  enhance  the  amplitude  contrast  between  the  noise  background  preceding  the 
onset  of  the  signal. 

This  happens  because  both  regional  seismic  signals  and  noise  have  similar  spectral  shapes  in  the  3-10  Hz  band, 
in  which  they  both  fall  off  with  frequency.  These  statements  seem  to  be  generally  true  for  typical  regional 
seismograms.  Thus  after  suitable  prefiltering  has  been  performed,  the  problem  reduces  to,  for  most  practical 
purposes,  to  a  detection  of  a  sudden  or  gradual  change  in  trace  amplitude  levels. 

3.2.  Some  CUSUM-based  onset  time  estimation  methods. 

A  step-like  change  in  amplitude  levels  can  be  detected  easier  if  a  cumulative  sum  (CUSUM)  of  the  squared  or 
absolute  amplitudes  it  is  computed  (Der  and  Shumway  1999,  Iclan  and  Tiao  1994,  Basseville  and  Nikiforov 
1993,  Shumway  1998).  After  imposing  a  linear  trend  to  the  CUSUM,  a  repetitive  onset  time  estimation  is 
performed,  locating  the  minima  or  maxima  of  various  chosen  functions,  by  simulated  annealing.  The  clustering 
in  the  multiple  onset  time  estimates  provides  a  means  to  assess  the  reliability  of  the  process  and  the  significance 
of  each  major  arrival.  In  practical  tests,  the  method  has  outperformed  those  based  on  testing  for  changes  in 
autoregressive  models. 

The  approaches  based  on  the  cumulative  sum  statistics  are  simple  to  apply  and  can  be  automated  for 
applications  in  a  CTBT  monitoring  environment.  Inclan  and  Tiao  (1996)  developed  a  properly  centered  and 
normalized  CUSUM  test  statistic  and  tabulated  critical  points  for  testing  of  significance.  Furthermore,  they 
developed  an  Iterative  Cumulative  Sum  of  Squares  (ICSS)  algorithm  that  allows  sequential  identification  of 
multiple  changepoints  in  a  white  noise  series.  In  this  work  we  have  tested  their  approach  on  a  set  of  regional 
seismic  data. 


Suppose  first  that  a  single  channel  time  series  xKy  K=l,2,....n  is  observed  and  we  have  a  possible  change  point 
that  is  to  be  detected  using  a  CUSUM  type  statistic.  Assume  that  the  time  series  v  has  been  pre whitened  and 
has  a  zero  mean,  so  that  the  change  in  regime  can  be  modeled  simply  by  a  change  in  the  variance  of  the  white 
noise  process.  Inclan  and  Tiao  (1996)  proposed  using  the  centered  and  scaled  cumulative  sum  of  the  squared 
amplitudes.  First  define  the  sum  of  squares  function  over  the  interval  [1  k]  of  length  points  as 

q =i>,2  a) 

i= 1 


The  scaled  and  normalized  CUSUM  statistic  over  the  interval  [1,T]  at  the  point  1<k<T is  defined  as 


Ck  K 
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(2) 


The  F  statistic  for  testing  for  a  change  of  variance  at  time  k  is 

h-„  =  ((CT  -  C  J  /  (T  -  k ))/  (C„  Ik )  (3) 


This  is,  of  course  the  standard  power  detector.  With  no  change  in  variance  in  the  time  interval  1<k<T  ,  D(k)  is 
a  monotone  function  of  t.  If  the  variance  increases  D(  k)  will  have  a  maximum  at  the  change  point. 


If  we  assume  that  the  xt  are  normally  distributed  with  mean  0  and  variances  a  then  we  can  obtain  the 
likelihoods  for  testing  one  change  against  no  change  and  let  NT=1 
represent  one  change.  The  likelihood  for  NT=0  is 


l(NT=  0;x)  =  -|log(2^)-|log[cr  /r]-| 

(4) 


The  likelihood  function  for  NT=1, 

l{t , N T=Vyx)  =  -|log(2/r)  -|log [Ck!k] 


change  at  point  t  is 
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The  best  estimate  of  the  change  point  is  where  the  likelihood  ratio  is 
maximized 

log  l(k, <jf  ,C722 )  «  —  max  j  - 1 log^l +^D^j~ log ^  1  - ^  j| 

(6) 

The  function  puts  more  weight  on  the  middle  of  the  time  series.  This  is  not  a  serious  disadvantage  since  prior  to 
applying  the  algorithm  one  has  a  fairly  good  idea  where  the  main  regional  phases  are  (from  standard  detection 
algorithms  and  F-K  analyses).  All  the  previous  formulas  assume  that  the  time  series  were  pre-whitened.  In 
doing  F-tests  with  bandwidth  limited  data  the  degrees  of  freedom  used  in  the  assumed  F  distributions  should  be 
appropriately  reduced  to  account  for  the  limited  bandwidth. 


The  paper  by  Chen  and  Gupta  (1997)  uses  a  corrected  form  of  AIC,  say 


AIC(C(k)) 

(7) 


2 log  l[k,oI ,al)  + 
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for  2<k<T-2 ,  and  chooses  x  as  a  minimizer  of  AIC(C(k))  by  introducing  the  second,  penalty  function  term. 
Constructing  an  unbiased  estimator  leads  to  equation  (7)  above  (Hurvich  and  Tsai  1989).  The  AIC  likelihood  is 
claimed  to  perform  better  if  the  break  point  is  close  to  the  ends  of  the  segment,  a  situation  that  we  try  to  avoid, 
however. 


A  binary  iterative  version  of  the  CUSUM  algorithm  was  proposed  by  Inclan  and  Tiao  (1994)  and  Chen  and 
Gupta  (1997)  which  uses  repetitive  applications  of  the  approach  described  above  for  subintervals  of  the  first 
time  interval.  The  method  applies  the  following  steps; 

1)  Calculate  D(k)  from  the  start  to  the  endpoint  of  the  initial  time  segment 

2)  Search  for  a  significant  maximum  as  defined  by  equation  (3).  If  one  is  found  at  Ki  then 

3)  Search  the  time  interval  for  another  significant  maxima  (that  pass  an  F-test) 

4)  Similarly  search  the  Kj<7f<T  intervals  for  a  significant  maximum 

5)  Continue  the  same  procedure  in  each  subinterval  until  no  more  significant  maxima  are  found. 

3.3.  Basic  building  blocks  of  the  CUSUM-based  onset  time  estimation  system 

In  order  to  apply  any  of  the  search  methods  for  locating  onsets,  first  one  needs  to  segment  the  seismograms  such 
that  each  of  the  segments  contain  only  one  arrival.  Typically,  a  regional  seismogram  contains  multiple  arrivals 
followed  by  either  a  coda  where  the  amplitude  decays  with  time  or  remains  nearly  constant. 

Moreover,  the  minima  of  the  variants  of  the  CUSUM  functions  described  above  will  provide  reasonable  onset 
time  estimates,  additional  refinements  can  greatly  improve  the  results.  The  exact  locations  of  such  absolute 
minima  may  be  accidental  and  thus  may  be  of  lesser  importance  than  the  overall  layout  of  the  regions  of 
minima,  whether  they  are  narrowly  localized  or  broad.  The  latter  case  implies  greater  uncertainties  in  the  onset 
times,  while  the  former  a  lesser  uncertainty.  It  is  therefore  desirable  to  use  stochastic,  randomized  search 
methods  repeatedly  and  use  the  multiple  onset  time  picks  to  estimate  more  stable  onset  times  and  their 
uncertainties. 


3.3.1.  Leaky  integration 

Applying  leaky  integration  to  the  absolute  trace  amplitude  generates  a  function  which  has  maxima  close  to  the 
trace  maxima  but  delayed  in  time.  This  function  is  useful,  therefore,  for  segmenting  the  seismogram  between 
such  maxiam  such  that  each  segment  will  contain  one  sudden  increase  of  trace  amplitude  envelopes  (a  phase 
onset). 


3.3.2.  F  tests. 

F  tests  are  standard  tools  for  verifying  the  arrival  of  a  new  package  of  energy  from  the  source,  i.  e.  a  phase 
arrival.  In  this  work  we  pick  the  arrival  time  from  the  modified  CUSUM  D(  k)  and  test  whether  the  variances  on 
both  sides  of  this  arrival  are  indeed  different.  In  performing  an  F  test  on  seismic  data  that  has  not  been 
pre whitened  one  must  make  sure  that  the  number  of  degrees  of  freedom  is  adjusted  appropriately.  F  statistics  for 
the  assumed  phase  onset  within  one  segment  must  pass  a  significance  test  before  the  onset  is  accepted. 
Otherwise  the  segment  is  declared  as  containing  no  onsets. 

3.3.3.  Simulated  annealing. 

Simulated  annealing  (SA)  was  designed  to  find  global  minima  of  irregular  functions  where  many  local  minima 
may  exist.  It  tends  to  disregard  minor  local  minima  and  converge  to  the  lowest  points.  It  uses  the  randomized 
Metropolis  search  algorithm  which  is  based  on  a  thermodynamic  analogy  (Press  et  al  1986).  Initially,  it  allows  a 


search  using  large  steps  in  the  independent  variables,  which  may  even  be  associated  with  increased  values  of  the 
function.  This  allows  the  solution  to  “jump  out”  from  local  minima  and  resume  searching  for  other  minima.  As 
“cooling”  occurs  such  steps  are  accepted  less  and  less  and  finally  the  solution  will  settle  in  broad  global 
minima.  Repeated  application  of  SA  with  random  starting  points  will  give  rise  to  populations  of  onset  time 
estimates  that  can  be  used  for  evaluation  of  the  efficiency  and  accuracy  of  the  method. 

3.3.4.  Cluster  analysis 

The  multiple  application  of  SA  searches  automatically  provides  means  for  assessing  the  stability  and  accuracy 
of  the  onset  estimates  derived  from  the  SA  method.  Starting  out  with  numerous  randomly  chosen  positions  for 
the  onset  time  within  a  search  window  these  will  converge  into  positions  of  prominent  CUSUM  minima  and 
form  a  varying  number  of  tight  clusters.  Besides  these  there  will  be  hopefully  much  fewer  scattered  ‘solutions’ 
that  are  obviously  spurious  and  thus  must  be  discarded.  This  approach  was  not  pursued  much  in  our  recent 
work  but  it  has  a  considerable  potential  (Der  and  Shumway  1999).  We  apply  the  well  known  K-means 
clustering  algorithm  (Tou  and  Gonzalez  1974,  Theodoris  and  Koutroumbas  1999)  for  finding  the  clusters  in  a 
population  of  SA  onset  times  or  trace  amplitude  maxima  in  this  report. 

3.4  Flow  diagram  of  the  process. 

Figure  1  shows  two  ways  such  a  process  may  be  put  together  from  its  building  blocks.  One  applies  the  SA 
algorithm,  the  other  does  not.  The  F-test  makes  the  decision  whether  in  a  given  segment  a  phase  onset  is 
declared  or  not.  The  one  involving  SA  is  more  refined,  but  it  takes  more  time  for  computations.  The 
optimization  of  the  whole  process  requires  optimum  choices  in  the  numerous  parameters  required.  Such  are 
those  appropriate  to  the  prefiltering  methods,  the  leaky  integration  constants,  the  temperatures  and  cooling  rates 
and  number  of  trials  for  SA,  specifications  needed  for  cluster  analysis.  Optimizing  such  parameters  requires 
the  testing  of  the  algorithms  on  large  amounts  of  data  and  minimizing  the  number  of  false  or  missed  onset  times 


> 

Onset  times 
end  eiroE 


No  arrival 


Figure  1.  Flow  diagram  of  the  process. 

3.4.  PRACTICAL  EXAMPLES  OF  THE  APPLICATIONS  OF  THE  METHODOLOGIES 
DESCRIBED  ABOVE. 

3.  4.1.  Performance  of  the  CUSUM  Procedure  for  Estimating  Pn  Onset  Times. 

During  the  early  phase  of  this  work  we  have  made  evaluations  of  the  performance  of  the  CUSUM  minimum 
and  the  combined  CUSUM-SA  procedure  for  picking  onset  times  of  first-arriving  Pn  phases  under  various  S/N 
scenarios.  The  evaluation  was  based  on  comparing  the  performance  of  a)  human  analysts  b)  picking  the 
absolute  minimum  of  normalized  CUSUM  and  keeping  the  arrival  if  the  F  tests  is  passed  c)  picking  the  median 
of  multiple  picks  using  SA  on  the  CUSUM  and  taking  the  median  of  these,  but  discarding  the  result  if  the  SA 
arrivals  are  highly  scattered. 

The  results,  reported  at  the  annual  research  review  meeting  last  year  and  in  a  subsequent  annual  report,  will  not 
be  repeated  here.  We  simply  state  that  the  performance  of  the  second  of  these  options  was  comparable  to  a 
human  analysts  picking  the  initial  Pn  arrival  times.  The  module  for  Pn  onset  time  estimation  will  be  tested  more 


extensively  in  the  near  future. 


3.4.2.  Application  of  the  binary  iterative  version  of  normalized  CUSUM  method  to  Segment  Complete 
Regional  Seismograms. 

The  iterative  binary  segmentation  method  method  of  Inclan  and  Tiao  (1996)  was  applied  to  many  regional 
seismograms.  Typically,  three  iterations  were  used.  During  the  first  iteration,  the  largest  arrival  was  commonly 
picked.  In  the  succeeding  iterations  the  smaller  arrivals  were  identified.  Although  this  binary  segmentation 
approach  works  well  for  events  where  the  various  arrivals  have  grossly  unequal  amplitudes,  it  fails  in  many 
cases  where  there  are  several  arrivals  with  comparable  amplitudes.  Therefore,  we  need  more  general 
segmentation  methods  while  retaining  much  of  the  statistical  testing  machinery  contained  in  the  papers  of 
Inclan  and  Tiao  (1996)  and  Chen  and  Gupta  (1997). 

3.4.3.  Methods  based  on  seismogram  segmentation. 

The  approach  proposed  below  seems  to  work  much  better  than  the  binary  segmentation  methods.  In  order  to 
find  the  rapid  increases  in  amplitude  we  have  leaky-integrated  the  absolute  trace  amplitudes  of  the  seismogram. 
The  maxima  of  traces  thus  processed  will  be  located  at  times  later  than  the  maximum  amplitude  portions  of  the 
respective  phases  due  to  the  time  delay  introduced  by  the  leaky  integral  low-pass  filter.  If  we  segment  the 
seismogram  from  beginning  to  end  using  these  maxima  as  the  intermediate  point  then  the  actual  onsets  will  be 
interior  to  these  segments.  If  we  locate  the  minima  of  the  functions  D( k)  within  these  initial  segments  then  we 
shall  have  the  approximate  locations  of  the  phase  onsets.  Now,  if  we  create  windows  that  are  centered  on  this 
preliminary  onset  time  with  the  end  points  at  the  maxima  of  the  leaky-integrated  absolute  amplitudes. 

We  have  tested  two  methods  for  finding  the  broad  maxima.  One  method  consisted  of  the  monitoring  the 
amplitude  differential  (gradient)  in  the  leaky-integrated  absolute  amplitude  trace  over  a  set  time  interval 
(typically  tied  to  the  time  constant  of  the  leaky  integrator),  finding  a  maximum  and  set  the  segment  separation 
such  that  this  gradient  decreased  by  a  set  factor  (typically  0.9).  Automatic  onset  time  picks  for  seismograms 
segmented  this  way  are  shown  in  Figure  2.  The  examples  shown  were  taken  from  the  Ground  Truth  data  base 
compiled  by  Multimax  Inc.  and  downloaded  through  the  Internet.  Many  of  these  seismograms  are  not  strictly 
regional,  some  distances  are  teleseismic.  Consequently  some  broader  bandpass  prefiltering  was  needed  to 
enhance  these  arrivals. 


ARU7021  BP  1-4 


NIL7021  BP  1 .5-7 


BGCA7100  BP  .3-3 


Figure  2  continued 
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Figure  2.  Examples  of  the  decomposition  of  seismograms  into  segments  bounded  by  the  decreasing 
portions  of  the  maxima  of  the  leaky-integrated  absolute  amplitudes  (second  trace.  This  is  followed  by 
looking  for  the  minima  in  the  function  D(t)  of  Incl  n  and  Tiao  (1994)  in  each  segment  (third  trace). 
This  is  followed  by  the  redefinition  of  shorter  segments  designed  to  be  centered  on  this  minimum  be 
bounded  by  the  previous  right  side  of  the  segment  (shorter  segments  below).  This  is  followed  by  an 
F-test  around  the  minimum  to  determine  whether  the  change  in  amplitudes  is  significant.  An  onset 
(vertical  lines)  is  declared  if  this  is  the  case.  Such  onsets  can  be  further  refined  by  actually  defining 
the  onsets  as  maxima  of  F  or  the  statistics  of  Chan  and  Gupta  (1997). 


Another  approach  we  have  tried  for  finding  the  broad  maxima  was  based  on  a  cluster  analysis  of  multiple 
solutions  produced  by  simulated  annealing  (SA)  .  Since  our  SA  program  was  designed  for  finding  minima, 
we  have  simply  reversed  the  sign  of  the  leaky-integrated  trace  for  such  computations.  The  centers  of  the 
clusters  for  the  picks  of  minima  thus  produced  can  be  used  as  limits  of  the  segments  to  be  searched  for 
single  increases  in  amplitude.  Note  that  this  procedure  is  the  same  as  the  one  we  have  applied  for  finding 
onset  times  previously  (Der  and  Shumway  1999).  Illustrations  of  the  results  of  such  procedures  are  shown 
in  Figure  3.  We  have  found  that  this  method  was  somewhat  more  robust  than  the  previous  one. 
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Figure  3.  Examples  of  the  decomposition  of  seismograms  (top  trace)  into  segments  bounded  by  the 
clusters  of  the  SA  picks  (plus  signs)  of  the  mimima  of  the  sign-inverted  leaky-integrated  absolute 
amplitudes  (second  trace),  followed  by  looking  for  the  minima  in  the  function  D(t)  of  Inclan  and 
Tiao  (1994)  in  each  segment  (third  trace).  This  is  followed  by  the  redefinition  of  shorter  segments 
designed  to  be  centered  on  this  minimum  be  bounded  by  the  previous  right  side  of  the  segment 
(shorter  segments  below).  The  next  step  is  an  F-test  around  the  minimum  to  determine  whether  the 
change  in  amplitudes  is  significant.  An  onsets  (vertical  lines)  are  declared  if  this  is  the  case.  The  first 
letters  in  the  titles  are  the  station  names  and  the  following  digits  are  the  event  ‘orid’-s  (see  data  base 
Schema  3.0  Manual),  referring  to  Table  I.  The  band-pass  filters  used  are  indicated  in  the  rest  of  the 
title. 


It  appears,  however,  that  reliable  segmentation  of  high  frequency  seismograms  and  the  automatic 
determination  of  the  onset  times  of  the  various  arrivals  will  require  sophisticated  algorithms.  The 
algorithms  need  to  have  sufficient  flexibility  to  recognize  arrivals  with  quite  different  amplitudes, 
impulsive  arrivals  without  codas,  phases  with  long  codas  and  phases  where  the  seismogram  essentially 
consists  of  a  series  of  increases  in  amplitude  without  much  decrease  until  the  Lg  coda.  The  latter  is  typical 
of  Kola  peninsula  quarry  blasts  and  the  binary  iterative  segmentation  procedures  of  Inclan  and  Tiao  (1994) 
and  Chen  and  Gupta  (1997)  seem  to  work  well  for  these.  For  seismograms  containing  phases  with  codas 
the  segmentation  methods  using  SA  work  better.  In  actual  implementation  of  such  methods  more  than  a 
single  approach  may  have  to  be  employed  in  interpreting  seismograms  in  the  automatic  mode. 


Since  the  methods  have  been  tested  in  the  interpretative  MATLAB  computing  environment  they  run  slow 
under  MATLAB.  This  is  misleading,  however.  According  to  our  previous  experience  (Der  and  Shumway 
1999),  when  implemented  as  compiled  codes  all  the  algorithms  tested  will  run  much  faster  than  any  human 
analyst  could  read  the  seismograms.  SA  is  a  very  fast  algorithm  because  it  mostly  involves  comparisons  of 
the  magnitudes  of  various  quantities  (Press  et  al  1986)  .  In  this  prototyping  stage  we  have  not  taken  the 
pains,  either,  to  fully  exploit  the  efficient  vectorized  computation  features  in  MATLAB.  In  the  actual 
implementation  MATLAB  codes  can  also  be  compiled  or  linked  to  compiled  C  or  FORTRAN  modules. 

4.0.  CONCLUSIONS  AND  RECOMMENDATIONS. 

CUSUM-based  methods  seem  to  be  quite  suitable  for  processing  regional  seismograms  since  these  consist 
of  long  wavetrains  and  have  emergent  phase  onsets.  Since  CUSUM  methods  emphasize  changes  in  the 
properties  of  signals  over  several  cycles  this  kinds  of  methods  can  be  used  to  segment  regional 
seismograms.  CUSUM-based  methods  to  pick  seismic  phase  onset  times  can  also  be  developed  based  on  a 
variety  of  statistics  that  are  diagnostic  of  polarization,  slowness,  and  spectral  changes  (Jurkevics  1988,  Der 
et  al  1993).  Other  candidates  may  include  instantaneous  relative  phase  differences  among  components, 
adaptive  slowness  estimates  or  their  combination. 

During  the  course  of  previous  work  described  in  Der  and  Shumway  (1999),  and  in  the  work  summarized  in 
this  report,  we  have  developed  the  essential  building  blocks  for  a  practical  implementation  of  an  automatic 
seismogram  interpretation  algorithm.  As  stated  above,  not  all  the  approaches  that  we  have  tried  worked 
equally  well  for  segmentation  of  seismograms  and  onset  estimation  under  the  various  scenarios 
encountered  in  practice.  For  seismograms  that  contain  steplike  increases  in  the  amplitudes  of  the  successive 
phases  binary  iterative  segmentation  procedures  of  Inclan  and  Tiao  (1994)  and  Chen  and  Gupta  (1997) 
seem  to  work  well.  For  seismograms  containing  phases  with  codas  and  variable  amplitudes  the 
segmentation  methods  using  SA  work  better.  Once  the  segments  are  defined  numerous,  previously 
developed  approaches  for  the  determination  of  the  onset  times  described  described  in  the  literature  can  be 
applied  (Inclan  and  Tiao  1994,  Chen  and  Gupta  1997,  Basseville  and  Nikiforov  1993). 

Any  practical  algorithm  package  for  the  automatic  interpretation  of  high  frequency  seismograms  will  have 
to  possess  sufficient  intelligence  to  recognize  various  situations  and  chose  the  best  algorithms.  We  shall 
implement  the  final  code  in  a  MATLAB  environment  complete  with  a  suitable  graphical  user  interface 
(Marchard  1990).  Most  of  the  algorithms  described  have  been  programmed  as  MATLAB  M-files  during 
the  work  described. 
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